1 Information retrieval: Knowledge-based extraction of named entities
Jamie Callan, Teruko Mitamura

November 2002 Proceedings of the eleventh international conference on Information 
and knowledge management 

Publisher: ACM Press 

Full text available: pdf(124.65 KB) Additional Information: full citation, abstract, references, index terms

The usual approach to named-entity detection is to learn extraction rules that rely on 
linguistic, syntactic, or document format patterns that are consistent across a set of 
documents. However, when there is no consistency among documents, it may be more 
effective to learn document-specific extraction rules. This paper presents a
knowledge-based approach to learning rules for named-entity extraction. Document-specific
extraction rules are created using a generate-and-test paradigm and a database ... 

Keywords: named-entity extraction 



2 An automated approach for retrieving hierarchical data from HTML tables
Seung-Jin Lim, Yiu-Kai Ng

November 1999 Proceedings of the eighth international conference on Information 
and knowledge management 

Publisher: ACM Press 

Additional Information: full citation, abstract, references, citings, index terms

Among the HTML elements, HTML tables [RH398] encapsulate hierarchically structured 
data (hierarchical data in short) in a tabular structure. HTML tables do not come with a 
rigid schema, and almost any form of two-dimensional table is acceptable according to
the HTML grammar. This relaxation complicates the process of retrieving hierarchical data 
from HTML tables. In this paper, we propose an automated approach for retrieving 
hierarchical data from HTML tables. The proposed approach constr ... 

3 Document Formatting Systems: Survey, Concepts, and Issues
Richard Furuta, Jeffrey Scofield, Alan Shaw 

September 1982 ACM Computing Surveys (CSUR), Volume 14 issue 3 
Publisher: ACM Press 

Full text available: pdf(5.36 MB) Additional Information: full citation, references, citings, index terms

4 JANUS: An interactive system for document composition
Donald D. Chamberlin, James C. King, Donald R. Slutz, Stephen J. P. Todd, Bradford W. Wade

June 1981 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN SIGOA
symposium on Text manipulation, Volume 16 Issue 6
Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

This paper describes the architecture of a proposed document composition system named 
JANUS, which is intended to provide support for authors of complex documents containing 
mixtures of text, line art, and tone art. The JANUS system is highly interactive, providing 
authors with immediate feedback and direct electronic control over page layouts, using a 
special two-display workstation. Authors communicate with the system by marking up
their documents with high-level descriptive "tags" ...

5 Information access and retrieval: Structured information retrieval in XML documents
Evangelos Kotsakis 

March 2002 Proceedings of the 2002 ACM symposium on Applied computing

Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

Query languages that take advantage of the XML document structure already exist. 
However, the systems that have been developed to query XML data explore the XML 
sources from a database perspective. This paper examines an XML collection from the 
viewpoint of Information Retrieval (IR). As such, we view the XML documents as a 
collection of text documents with additional tags and we attempt to adapt existing IR 
techniques to achieve more sophisticated search on XML documents. We employ a class of 
q ... 

Keywords: XML information retrieval, full text searching, semistructured data indexing, 
web data indexing 



6 Re-engineering structures from Web documents
Chuang-Hue Moh, Ee-Peng Lim, Wee-Keong Ng 

June 2000 Proceedings of the fifth ACM conference on Digital libraries 
Publisher: ACM Press 

Full text available: pdf(180.95 KB) Additional Information: full citation, abstract, references, citings, index terms

To realize a wide range of applications (including digital libraries) on the Web, a more
structured way of accessing the Web is required, and such a requirement can be facilitated
by the use of the XML standard. In this paper, we propose a general framework for reverse
engineering (or re-engineering) the underlying structures, i.e., the DTD, from a collection of
similarly structured XML documents when they share some common but unknown DTDs.
The essential data structures and algorithms for ...

Keywords: Web information discovery, XML 
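The abstract is truncated before the algorithm. To illustrate the problem only, the Python sketch below derives a loose DTD from sample documents. For each element name it records every child element seen under it and whether that element ever carries text. This naive union-of-children approach is an assumption made for illustration. It is not the framework the paper proposes, and the function name `infer_loose_dtd` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def infer_loose_dtd(xml_strings):
    """Naively infer <!ELEMENT> declarations from sample XML documents.

    Hypothetical helper: every child element name observed under an element,
    in any document, becomes an alternative in a repeated choice group.
    """
    children = defaultdict(set)   # element name -> child names observed
    has_text = defaultdict(bool)  # element name -> non-blank text observed?
    for doc in xml_strings:
        for elem in ET.fromstring(doc).iter():
            kids = children[elem.tag]  # touching the key also records leaves
            for child in elem:
                kids.add(child.tag)
                if child.tail and child.tail.strip():
                    has_text[elem.tag] = True  # text between child elements
            if elem.text and elem.text.strip():
                has_text[elem.tag] = True

    declarations = []
    for name in sorted(children):
        parts = sorted(children[name])
        if has_text[name]:
            parts = ["#PCDATA"] + parts  # mixed content lists #PCDATA first
        if not parts:
            model = "EMPTY"
        elif parts == ["#PCDATA"]:
            model = "(#PCDATA)"
        else:
            model = "(" + " | ".join(parts) + ")*"
        declarations.append(f"<!ELEMENT {name} {model}>")
    return declarations


samples = [
    "<book><title>A</title><author>X</author></book>",
    "<book><title>B</title><year>2000</year></book>",
]
for line in infer_loose_dtd(samples):
    print(line)
```

For the two samples, `book` is declared as `(author | title | year)*`. That choice group admits documents neither sample exhibits. This over-generalization is why a real re-engineering framework needs more careful structure analysis.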



7 A fine-grained access control system for XML documents 

Ernesto Damiani, Sabrina De Capitani di Vimercati, Stefano Paraboschi, Pierangela Samarati 
May 2002 ACM Transactions on Information and System Security (TISSEC), Volume 5 Issue 2
Publisher: ACM Press

Full text available: pdf(330.60 KB) Additional Information: full citation, abstract, references, citings, index terms

Web-based applications greatly increase information availability and ease of access, which
is optimal for public information. The distribution and sharing via the Web of information
that must be accessed selectively, such as electronic commerce transactions,
requires the definition and enforcement of security controls, ensuring that information will
be accessible only to authorized entities. Different approaches have been proposed that
address the problem of protecting information in a Web ...

Keywords: Access control, World Wide Web, XML documents, authorizations specification 
and enforcement 



8 OpenTag and TMX: XML in the localization industry

William Burns, Walter Smith 
September 1998 Proceedings of the 16th annual international conference on
Computer documentation 

Publisher: ACM Press 

Full text available: pdf(443.96 KB) Additional Information: full citation, references, index terms



9 INFO: a simple document annotation facility

Scott Tilley, Hausi Müller
October 1991 Proceedings of the 9th annual international conference on Systems 
documentation 

Publisher: ACM Press 

Full text available: pdf(619.22 KB) Additional Information: full citation, references, citings, index terms



10 XML and text: XRANK: ranked keyword search over XML documents
Lin Guo, Feng Shao, Chavdar Botev, Jayavel Shanmugasundaram

June 2003 Proceedings of the 2003 ACM SIGMOD international conference on
Management of data 
Publisher: ACM Press 

Full text available: pdf(265.38 KB) Additional Information: full citation, abstract, references, citings, index terms

We consider the problem of efficiently producing ranked results for keyword search
queries over hyperlinked XML documents. Evaluating keyword search queries over
hierarchical XML documents, as opposed to (conceptually) flat HTML documents,
introduces many new challenges. First, XML keyword search queries do not always return
entire documents, but can return deeply nested XML elements that contain the desired
keywords. Second, the nested structure of XML implies that the notion of ranking is no ...
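To make the first challenge concrete, the sketch below returns the most deeply nested elements that contain all query keywords, instead of whole documents. It rests on assumed simplifications: whitespace tokenization, no ranking, and no hyperlinks. It is not the XRANK ranking algorithm, and the names `contains_all` and `deepest_matches` are hypothetical.

```python
import xml.etree.ElementTree as ET


def contains_all(elem, keywords):
    """True if the text of elem and all its descendants includes every keyword."""
    words = set(" ".join(elem.itertext()).lower().split())
    return all(k.lower() in words for k in keywords)


def deepest_matches(xml_string, keywords):
    """Return the elements that contain every keyword while none of their
    children does, i.e. the most specific elements answering the query."""
    results = []

    def visit(elem):
        if not contains_all(elem, keywords):
            return False  # no descendant can match if this subtree cannot
        # Visit every child (no short-circuit) so all deeper answers are found.
        child_hits = [visit(child) for child in elem]
        if not any(child_hits):
            results.append(elem)
        return True

    visit(ET.fromstring(xml_string))
    return results


doc = ("<paper><title>XML ranking</title><sec>"
       "<p>keyword search over XML</p><p>ranking of results</p></sec></paper>")
print([e.text for e in deepest_matches(doc, ["xml", "search"])])
```

Here the answer is the single `<p>` element, not the enclosing `<sec>` or `<paper>`. This sketch rescans subtree text at every level, which is acceptable for illustration but not for large collections.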

11 Document architecture and text formatting
Arno J. H. Peels, Norbert J. M. Janssen, Wop Nawijn

October 1985 ACM Transactions on Information Systems (TOIS), volume 3 issue 4 

Publisher: ACM Press 

Full text available: pdf(1.67 MB) Additional Information: full citation, abstract, references, citings, index terms

The formalization of the architecture of documents and of text formatting are the central
issues of this paper. Besides a fundamental and theoretical approach toward these topics,
an overview of the COBATEF system is presented. COBATEF is a context-based
text formatting system for which both a software and a hardware
implementation are available. A unique feature of the system is its automatic text-element
recognition mechanism, which is context based and consequently ...

12 Trigger-pair predictors in parsing and tagging
Ezra Black, Andrew Finch, Hideki Kashioka 

August 1998 Proceedings of the 17th international conference on Computational 
linguistics - Volume 1 , Proceedings of the 36th annual meeting on 
Association for Computational Linguistics - Volume 1 

Publisher: Association for Computational Linguistics , Association for Computational Linguistics 

Full text available: pdf(696.41 KB), Publisher Site
Additional Information: full citation, abstract, references

In this article, we apply to natural language parsing and tagging the device of trigger-pair
predictors, previously employed exclusively within the field of language modelling for
speech recognition. Given the task of predicting the correct rule to associate with a
parse-tree node, or the correct tag to associate with a word of text, and assuming a particular
class of parsing or tagging model, we quantify the information gain realized by taking
account of rule or tag trigger-pair predictors, i.e ...
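One common way to quantify such a gain is the mutual information between a binary trigger feature and the label being predicted, H(Y) - H(Y | X). The trigger feature records whether word or tag A occurred in the preceding context. The sketch below computes that quantity from observed events. The event format and the function names are assumptions, and the paper's exact formulation is not visible in this truncated abstract.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy, in bits, of a Counter of label frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def information_gain(events):
    """events: (trigger_seen, label) pairs, where trigger_seen records whether
    the trigger word or tag occurred in the preceding context.
    Returns H(label) - H(label | trigger_seen) in bits."""
    h_label = entropy(Counter(label for _, label in events))
    h_conditional = 0.0
    for flag in (True, False):
        subset = Counter(label for seen, label in events if seen == flag)
        n = sum(subset.values())
        if n:
            # Weight each conditional entropy by how often that condition occurs.
            h_conditional += (n / len(events)) * entropy(subset)
    return h_label - h_conditional


perfect = [(True, "NN"), (True, "NN"), (False, "VB"), (False, "VB")]
useless = [(True, "NN"), (True, "VB"), (False, "NN"), (False, "VB")]
print(information_gain(perfect), information_gain(useless))
```

A trigger that perfectly separates the two tags yields 1 bit. A trigger independent of the tag yields 0 bits.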

13 Structured answers for a large structured document collection
Michael Fuller, Eric Mackie, Ron Sacks-Davis, Ross Wilkinson

July 1993 Proceedings of the 16th annual international ACM SIGIR conference on
Research and development in information retrieval
Publisher: ACM Press

Full text available: pdf(1.09 MB) Additional Information: full citation, abstract, references, citings, index terms

There is a simple method for integrating information retrieval and hypertext. This consists 
of treating nodes as isolated documents and retrieving them in order of similarity. If the 
nodes are structured, in particular, if sets of nodes collectively constitute documents, we 
can do better. This paper shows how the formation of the hypertext, the retrieval of nodes 
in response to content based queries, and the presentation of the nodes can be achieved 
in a way that exploits the knowledge enco ... 
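The "simple method" at the start of this abstract ranks each node independently by its similarity to the query. It can be sketched as follows. Raw term-frequency vectors compared by cosine similarity are an assumed choice, because the abstract names no measure, and `rank_nodes` is a hypothetical helper. The structure-aware approach the paper itself proposes is not shown.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_nodes(nodes, query):
    """nodes maps node id -> text. Each node is scored as an isolated document,
    ignoring how nodes group into larger documents; returns ids best-first."""
    q = Counter(query.lower().split())
    scored = [(cosine(Counter(text.lower().split()), q), node_id)
              for node_id, text in nodes.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by id
    return [node_id for score, node_id in scored if score > 0]


nodes = {
    "n1": "hypertext links between nodes",
    "n2": "information retrieval of documents",
    "n3": "retrieval of hypertext nodes",
}
print(rank_nodes(nodes, "hypertext retrieval"))
```

Because every node is scored alone, `n1` and `n2` tie even if one of them belongs to a document that is far more relevant overall. This limitation of the simple method is the one the abstract says the structured approach addresses.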

14 Is universal document exchange in our future? 
Louis M. Gomez, Donald F. Pratt, Mark R. Buckley 

October 1988 Proceedings of the 6th annual international conference on Systems 
documentation 

Publisher: ACM Press 

Full text available: pdf(824.12 KB) Additional Information: full citation, index terms



15 Poster papers: Discovering informative content blocks from Web documents 

Shian-Hua Lin, Jan-Ming Ho 

July 2002 Proceedings of the eighth ACM SIGKDD international conference on
Knowledge discovery and data mining
Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

In this paper, we propose a new approach to discovering informative content in a set of
tabular documents (or Web pages) of a Web site. Our system, InfoDiscoverer, first
partitions a page into several content blocks according to the HTML tag <TABLE> in a Web
page. Based on the occurrence of the features (terms) in the set of pages, it calculates the
entropy value of each feature. From the entropy values of the features in a content
block, the entropy value of the block is defined. By analyz ...

Keywords: entropy, information extraction, information retrieval, informative content 
discovery 
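A minimal sketch of the entropy computation follows. It assumes pages are already split into <TABLE>-level blocks of terms and omits the HTML parsing. A term's weights are taken to be its per-page frequencies normalized to sum to 1, with the logarithm in base N (the number of pages), so a term spread evenly over all pages scores 1. A block's entropy is the mean entropy of its distinct terms. These normalization choices are assumptions, since the abstract is truncated before the details, and the function names are hypothetical.

```python
import math
from collections import Counter


def feature_entropies(pages):
    """pages: list of pages; each page is a list of blocks; each block is a
    list of terms. Returns term -> entropy of its distribution over pages,
    using log base N (number of pages) so values fall in [0, 1]."""
    n = len(pages)
    if n < 2:
        raise ValueError("need at least two pages to compare")
    per_page = [Counter(t for block in page for t in block) for page in pages]
    entropies = {}
    for term in set().union(*per_page):
        freqs = [counts[term] for counts in per_page]
        total = sum(freqs)
        h = 0.0
        for f in freqs:
            if f:
                w = f / total  # normalized weight of the term in this page
                h -= w * math.log(w, n)
        entropies[term] = h
    return entropies


def block_entropy(block, entropies):
    """Mean entropy of the distinct terms in a block (0.0 for an empty block)."""
    terms = set(block)
    return sum(entropies[t] for t in terms) / len(terms) if terms else 0.0


pages = [
    [["home", "news", "contact"], ["election", "results"]],
    [["home", "news", "contact"], ["weather", "storm"]],
]
ent = feature_entropies(pages)
for block in pages[0]:
    print(block, round(block_entropy(block, ent), 3))
```

Under these conventions, the navigation block repeated on every page scores 1.0, and the page-specific block scores 0.0. Low-entropy blocks are therefore the candidates for informative content.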



16 Workshop: Documenting software systems with views II: an integrated approach
based on XML

Jochen Hartmann, Shihong Huang, Scott Tilley

October 2001 Proceedings of the 19th annual international conference on Computer 
documentation 

Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

Software engineers rely on program documentation as an aid in understanding the 
functional nature, high-level design, and implementation details of complex applications. 
Without such documentation, engineers are forced to rely solely on source code. This is a 
time-consuming and error-prone process, especially when one considers the amount of 
information assimilation and domain mapping that is required to understand the 
architecture of a large-scale software system. This paper describes an integr ... 

Keywords: MSR MEDOC, XML, reverse engineering, software documentation 



17 Interactive mathematics via the Web using MathML
Francis J. Wright

June 2000 ACM SIGSAM Bulletin, Volume 34 Issue 2
Publisher: ACM Press 

Full text available: pdf(1.07 MB) Additional Information: full citation, abstract, index terms

MathML is a mathematical markup language intended for displaying mathematics in web
browsers. At present, it can be used to display mathematics generated dynamically in 
response to interactive queries only if the browsing and generating facilities are chosen 
carefully. This paper examines the background and possible options, and describes some 
of the details of the use of MathML to display the output from a web-based demonstration 
of an ordinary differential equation solver running in REDUCE ... 

Source code is, among other things, a text to be read. In this paper I argue that reading
source code is a key activity in software maintenance, and that we can profitably apply 
experiences and reading systems from text databases to the problem of reading source 
code. Three prototype systems are presented, and the main features of their design are 
discussed. 

An Embedded Web Server (EWS) is a Web server that runs on an embedded
system with limited computing resources to serve embedded Web documents to a Web
browser. By embedding a Web server into a network device, it is possible to provide a
Web-based management user interface that is user-friendly,
inexpensive, cross-platform, and network-ready. This article explores
the topic of an efficient and lightweight embedded Web server for Web-based
netw ...